An optimal network for passenger traffic 






A.K. Nandi, K. Bhattacharya and S.S. Manna 
Satyendra Nath Bose National Centre for Basic Sciences, Block-JD, Sector-III, Salt Lake, Kolkata-700098, India

The optimal solution of an inter-city passenger transport network has been studied using Zipf's
law for the city populations and the Gravity law describing the fluxes of inter-city passenger traffic.
Assuming a fixed value for the cost of transport per person per kilometer, we observe that while the
total traffic cost decreases, the total wiring cost increases with the density of links. As a result, the
total cost to maintain the traffic distribution is optimal at a certain link density, which vanishes on
increasing the network size. At a finite link density the network is scale-free. Using this model the
air-route network of India has been generated, and a one-to-one comparison of the nodal degree
values with the real network has been made.

PACS numbers: 89.75.Hc, 89.75.Fb, 05.60.-k, 45.10.Db

The identification of certain crucial controlling parameters
that ensure the characteristic structure of a random
network, be it a network that has been created by a natural
process or a network that has evolved due to social
requirements, has been a focal point of research interest
for quite some time. For example, different algorithms have
been proposed to generate the well-known scale-free
structures of highly heterogeneous networks, which
successfully reproduce the statistical features of important
networks like the Internet, the World Wide Web, and airport
networks.

On the other hand, not much attention has been paid to
reproducing the structural features specific to a particular
network and to making a one-to-one comparison of the
real and the model networks. Intuitively it is evident that
such modeling would need information specific to that
network. In this paper we argue that for a network of
passenger traffic it is possible to construct an optimized
model network of this kind using only two ingredients,
namely the node-wise population distribution and
a guiding rule for the passenger traffic flows.

A transport network should be efficient as well as
cost-effective. Efficiency is ensured when the communication
between an arbitrary pair of nodes takes only a finite
and short duration even when the network is very large.
This implies that the network must be characterized by
'small-world' features. In addition the network should
be robust with respect to random failures. If a link is
down, the transport process should not be grossly affected.
This implies that the network must not have a tree
structure, which is the most economical but is extremely
sensitive to failures. In practice the network should be
such that when the flow is not possible along a certain
path, there exist alternate paths, even of longer
lengths, to maintain the flow. Indeed real-world transport
networks are never like tree graphs. They have multiple
loops of many different length scales and are therefore
hardly affected by random link or node failures. A
prominent example of this is the Internet, whose robustness
to random failures in its structure is quite well known.
Secondly, the laying cost of the network is another
controlling factor. If every node is connected to all other
nodes it would be excellent, but that would involve large
establishment and maintenance costs. Planners and
administrators of railway networks, city bus transport
systems, or even postal networks establish and upgrade
their networks keeping mainly these two aspects in mind.

Recently, optimal networks embedded in Euclidean
space have attracted much attention. Given a spatial
distribution of human population, the problem of locating
the different facilities so that the mean distance is a
minimum was discussed in [9]. Signatures of topology and
patterns are explored in [10]. A minimal spanning tree
structure of the optimal network was proposed in [11].



Here we study a model network for the passenger traffic
among different cities. We ask whether, given the populations
and locations of all cities in a country, one can predict
the structure of the network that is optimized with respect
to the connection robustness and wiring cost. Our
study is based on the framework of Zipf's law [12, 13] of
city population distribution and the Gravity law [14] of
social and economic sciences describing the strength of
the passenger traffic between a pair of cities. Finally, we
apply this scheme to the Indian air traffic network, which
gives good correspondence with the real network.

Zipf's law for the frequency of occurrence of English
words has also been applied to the rank-size distribution
of city populations [12]. In a country the most populated
city is assigned rank 1, the second largest is put
in rank 2, etc. It is known that the population size $p(r)$
varies inversely with rank $r$. This also implies that the
population distribution is a power law [13].

FIG. 1: Finite-size scaling analysis of the network cost
functions, demonstrated by the collapse of the data for
network sizes $N$ = 128, 256 and 512: (a) the total wiring
cost $C_{net}$, scaled by a power of $N$, scales with the
link density $\rho$; (b) the total travel cost $C_{tra}$ to
maintain the traffic, scaled by a power of $N$, scales with
$\rho N$; and (c) the total cost $C_{net} + 10 N C_{tra}$,
scaled by a power of $N$, also scales with $\rho N$.

The passenger traffic among $N$ different cities and
towns in a country is given by the well-established Gravity
model [14]. In its introductory form the magnitude
of the passenger flow $F_{ij}$ from city $i$ to city $j$ is
jointly proportional to their individual populations $p_i$
and $p_j$ and at the same time is penalized by an inversely
proportional factor, which is the square of their distance
of separation $\ell_{ij}$: $F_{ij} \propto p_i p_j / \ell_{ij}^2$.
This equation has been generalized to an asymmetric
parametric form, Eq. (1), in which $\alpha$, $\beta$ and
$\theta$ are suitable parameters and $k$ runs over
all $N - 1$ nodes except $i$.
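
The generalized flux can be sketched numerically. Eq. (1) itself is not reproduced in this text, so the snippet below assumes an origin-normalized form, $F_{ij} = p_i^{\alpha} p_j^{\beta} \ell_{ij}^{-\theta} / \sum_{k \neq i} p_k^{\beta} \ell_{ik}^{-\theta}$. That form is an assumption, chosen only because it is asymmetric and contains a sum over the $N - 1$ nodes $k \neq i$; the function name `gravity_flux` and the sample data are hypothetical.

```python
import math

def gravity_flux(pop, pos, alpha=1.0, beta=1.0, theta=2.0):
    """Directed fluxes F[i][j] under an ASSUMED origin-normalized gravity form."""
    n = len(pop)
    flux = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Attraction of every destination k != i as seen from origin i.
        attract = [0.0] * n
        for k in range(n):
            if k != i:
                dist = math.dist(pos[i], pos[k])
                attract[k] = pop[k] ** beta / dist ** theta
        total = sum(attract)
        for j in range(n):
            if j != i:
                flux[i][j] = pop[i] ** alpha * attract[j] / total
    return flux

# Hypothetical three-city example (populations and coordinates are made up).
pop = [0.5, 0.3, 0.2]
pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
F = gravity_flux(pop, pos)
```

Under this assumed form with $\alpha = 1$, the fluxes leaving city $i$ sum to $p_i$, and $F_{ij}$ generally differs from $F_{ji}$, which is the asymmetry the text refers to.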

While applying these two laws we assume that not only 
the total population of the country is conserved but also 
the individual city populations remain constant. More 
specifically, we assume that in a certain unit of time $F_{ij}$
tourists travel from city $i$ to city $j$ but they eventually
return to their own city i within the same time interval. 
Of course there are a few who migrate from one city to 
the other and start living there, but their number must 
be very small compared to the tourist traffic, and we ig- 
nore this component of migratory flow. Therefore neither 
the city populations nor the inter-city traffic flow changes 
with time. It seems that our model should also be quite 
appropriate for a postal distribution network. 

FIG. 2: The nodal degree distribution $P(k, N)$ vs. $k$ of
the optimized network at the link density $\rho = 0.1$.
(a) The distribution for network sizes $N$ = 128, 256 and
512. (b) Finite-size scaling of the same distributions gives
a value of the exponent $\gamma = 1$.

In a simple model we take a unit square box on the
$x$-$y$ plane to represent the country and $N$ points
distributed at random positions within the box as the
locations of different cities. Though the periodic boundary
condition has no physical meaning in this context, we
use it along both the transverse directions of the box
to make the data better behaved. Given the set of
coordinates of the $N$ points $\{x_i, y_i\}$, $i = 1, \dots, N$,
all inter-city distances $\ell_{ij}$ are determined. Cities are
then assigned populations $p_i$ ($i = 1, \dots, N$), as real
numbers, by drawing them from a power-law distribution
$\mathrm{Prob}(p) \sim p^{-\mu}$ with $\mu = 1$ as per Zipf's
law. Using $p_{min} = 0.001$ and $p_{max} = 1$, the city
populations are generated using the relation
$p_i = p_{min} (p_{max}/p_{min})^{r_i}$, where $r_i$ is a
uniformly distributed random fraction, and are finally
normalized such that $\sum_i p_i = 1$.
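
The population assignment can be sketched as inverse-transform sampling. The snippet reads the generating relation as $p = p_{min} (p_{max}/p_{min})^{r}$ with $r$ uniform in $[0, 1)$, which corresponds to $\mathrm{Prob}(p) \sim 1/p$; that reading, the function names and the seed are assumptions made for illustration. A helper for distances under periodic boundaries is included as well.

```python
import random

def sample_populations(n, p_min=0.001, p_max=1.0, seed=0):
    """Draw n populations with Prob(p) ~ 1/p on [p_min, p_max], normalized to sum to 1."""
    rng = random.Random(seed)
    # Inverse-transform sampling: p = p_min * (p_max / p_min) ** r, r uniform in [0, 1).
    raw = [p_min * (p_max / p_min) ** rng.random() for _ in range(n)]
    total = sum(raw)
    return [p / total for p in raw]

def periodic_distance(a, b, box=1.0):
    """Euclidean distance in a square box with periodic boundaries in x and y."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx = min(dx, box - dx)  # take the shorter way around the box
    dy = min(dy, box - dy)
    return (dx * dx + dy * dy) ** 0.5

pops = sample_populations(128)
d = periodic_distance((0.1, 0.5), (0.9, 0.5))  # wraps around the boundary
```

Because the raw values lie between `p_min` and `p_max`, the ratio of the largest to the smallest population cannot exceed `p_max / p_min`, and normalization does not change that ratio.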

FIG. 3: Air route connection network of 80 civilian airports
in India. (a) The real air route network of $L = 265$ links
connecting different pairs of airports by all 12 airline
companies active in India. (b) The network obtained from our
model using 2001 census data for the city populations of the
associated Indian cities and using the Gravity law with
$\alpha = \beta = 1$ and $\theta = 2$. This network also has
265 links.

Knowing the values of the city populations $p_i$ and the
mutual inter-city distances, the magnitudes of the passenger
fluxes $F_{ij}$ and $F_{ji}$ are calculated using Eq. (1) for a
certain set of values of $\alpha$, $\beta$ and $\theta$. By
definition this flow pattern is inherently directed. However,
we consider only an undirected traffic flux between $i$ and
$j$ by taking the net flow $F_{ij} + F_{ji}$, which we again
denote by $F_{ij}$. Suppose that at some arbitrary
intermediate stage all $N$ cities are linked by a singly
connected network. Since the nodes are randomly distributed
on a continuous plane, there exists one and only one shortest
path between a pair of nodes. We assume that the entire flow
$F_{ij}$ passes through the shortest path on the network
connecting the nodes $i$ and $j$, and therefore each link on
this path is assigned $F_{ij}$. When this assignment process
has been completed for all $N(N-1)/2$ distinct node pairs, the
net flow through a link measures the net traffic $w$ through
that link. The quantity $w$ is like a weighted betweenness
centrality, and the net traffic along a link connecting a
pair of nodes $i'$ and $j'$ is $w_{i'j'} = \sum F_{ij}$, where
the summation is taken over the subset of the $N(N-1)/2$ node
pairs whose shortest paths pass through the link $\{i'j'\}$.
On a graph having loops the shortest paths are found using
the well-known Dijkstra algorithm [16].
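
The traffic assignment described here can be sketched as follows: for every node pair, find the shortest path with Dijkstra's algorithm and add the pair's flux to each link on that path. The data layout (`adj` as a dictionary of neighbor lengths, `flux` keyed by ordered pairs) and the tiny chain network are hypothetical choices for illustration, and the graph is assumed to be connected.

```python
import heapq

def shortest_path(adj, src, dst):
    """Dijkstra on adj = {node: {neighbor: length}}; returns the node list src..dst."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def link_traffic(adj, flux):
    """Accumulate each undirected pair flux on every link of its shortest path."""
    w = {}
    nodes = sorted(adj)
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            path = shortest_path(adj, i, j)
            for u, v in zip(path, path[1:]):
                key = (min(u, v), max(u, v))
                w[key] = w.get(key, 0.0) + flux[(i, j)]
    return w

# Hypothetical 3-node chain 0 - 1 - 2 with symmetrized pair fluxes.
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
flux = {(0, 1): 2.0, (0, 2): 3.0, (1, 2): 5.0}
w = link_traffic(adj, flux)
```

In the chain, the flux between the two end nodes has to cross both links, so each link carries its own pair's flux plus the end-to-end flux.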

The cost function for this traffic distribution has two
competing factors. Given a network there is a cost to
maintain the traffic along every link. We assume a fixed
value for the cost to transport a unit population through
unit distance along every link of the network (e.g., per
person per kilometer). Therefore if $w_{ij}$ is the net flow
between the two end nodes $i$ and $j$ of a link of length
$\ell_{ij}$, then the total cost involved to maintain the
entire traffic flow is $C_{tra} = \sum_{i \neq j} w_{ij} \ell_{ij} a_{ij}$.
The second factor is the establishment cost to construct the
connections, which is $C_{net} = \sum_{i \neq j} \ell_{ij} a_{ij}$.
Here, the $a_{ij}$ are the elements of the adjacency matrix:
$a_{ij} = 1$ if there exists a link between $i$ and $j$,
otherwise it is 0. Therefore the total cost function to
maintain the whole traffic distribution of the network is
the sum of these two factors:

$C = C_{net} + \lambda C_{tra} = \sum_{i \neq j} (1 + \lambda w_{ij}) \ell_{ij} a_{ij}$     (2)

where $\lambda$ is a type of conversion factor that makes an
equivalence between the two types of cost.
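
As a small numerical illustration of Eq. (2), the sketch below evaluates the two cost components and their weighted sum from a list of links. The dictionary layout and the numbers are hypothetical, and each undirected link is counted once here, which is a convention choice that rescales both components by the same factor.

```python
def network_costs(links, lam):
    """links: {(i, j): (length, traffic)} with one entry per undirected link.

    Returns (C_net, C_tra, C) with C = C_net + lam * C_tra.
    """
    c_net = sum(length for length, _ in links.values())
    c_tra = sum(traffic * length for length, traffic in links.values())
    return c_net, c_tra, c_net + lam * c_tra

# Hypothetical two-link network: (length, traffic) per link.
links = {(0, 1): (1.0, 5.0), (1, 2): (2.0, 8.0)}
c_net, c_tra, c_total = network_costs(links, lam=10.0)
```

Adding a link always raises `c_net`, while it can only lower or leave unchanged the path lengths that enter `c_tra`; this is the competition the total cost is meant to capture.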

FIG. 4: (a) The degree $k(r)$ of an airport in India plotted
against its rank $r$ in population; real data in open
circles and model data in filled circles. Average slopes are
shown by the dashed line and the solid line for the real and
model data, respectively. (b) The one-to-one comparison of
the nodal degree values between the real and the model
networks using $k_{factor}(r)$. This factor has been plotted
against the nodal rank $r$.

The Minimal Spanning Tree (MST) graph covering all
$N$ nodes, using the Euclidean distances $\ell_{ij}$ as the
link weights, has the minimal value of the networking cost
$C_{net}$. Using Kruskal's algorithm to generate the MST, the
whole set of $N(N-1)/2$ links is arranged in a sequence
of increasing lengths. Links are then placed on the graph
one by one in this order, starting from the minimal length.
Links which form loops are rejected. This is checked by
the well-known Hoshen-Kopelman algorithm in Percolation
Theory. The MST is obtained when one successfully places
$N - 1$ links.
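
The MST construction can be sketched with Kruskal's algorithm. In this sketch the loop check is done with a union-find structure, used here as a stand-in for the Hoshen-Kopelman check named in the text; the function name and the four sample points are hypothetical, and plain (non-periodic) Euclidean distances are assumed for brevity.

```python
import math
from itertools import combinations

def minimal_spanning_tree(points):
    """Kruskal's algorithm; loop detection uses union-find (a swapped-in technique)."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All N(N-1)/2 candidate links, sorted by increasing Euclidean length.
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for length, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # the link joins two clusters, so it closes no loop
            parent[ri] = rj
            tree.append((i, j, length))
            if len(tree) == n - 1:
                break
    return tree

# Hypothetical coordinates for four cities.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 3.0)]
mst = minimal_spanning_tree(points)
```

For these points the link between nodes 0 and 2 is rejected because nodes 0, 1 and 2 are already in one cluster when it is considered.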

To construct the optimal network for passenger traffic
we start by constructing the MST as described above,
connecting all the $N$ nodes, and the cost components
$C_{tra}$, $C_{net}$ and $C$ for this network are calculated.
Obviously the optimized network cannot have a tree
structure, in which the typical distance between an
arbitrary node pair is much larger. Therefore we add further
links to the graph to reduce $C_{tra}$ using the following
procedure. First, all node pairs which are not linked are
listed. Our strategy is to pick the particular unlinked node
pair which, if connected, decreases $C_{tra}$ by the maximal
amount. If the length of the shortest path measured on
the network between a typical pair of unlinked nodes $i$
and $j$ is $d_{ij}$, then the reduction in travel cost when
the node pair is connected directly by a link of length
$\ell_{ij}$ is approximately
$\Delta C_{tra} = (d_{ij} - \ell_{ij}) F_{ij}$. This difference
is calculated for all unlinked node pairs in the same
way. We select the particular node pair for which
$\Delta C_{tra}$ is maximum and link them. After that we
recalculate all shortest paths and $w_{ij}$ values afresh and
repeat the whole process to add the next link. In this way
links are added one by one, and the values of both cost
components and of the total cost are measured with
increasing link density.
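
One step of this greedy procedure can be sketched as below. For brevity the network distances $d_{ij}$ are computed with the Floyd-Warshall algorithm rather than repeated Dijkstra runs; that substitution, the function names, and the four-city data are assumptions made for illustration. Pairs are keyed as `(i, j)` with `i < j`.

```python
import math

def all_pairs_distances(n, links):
    """Floyd-Warshall network distances; links maps (i, j) -> length."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), length in links.items():
        d[i][j] = d[j][i] = min(d[i][j], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def best_new_link(n, links, length, flux):
    """Return the unlinked pair maximizing (d_ij - l_ij) * F_ij, with that gain."""
    d = all_pairs_distances(n, links)
    best, best_gain = None, -math.inf
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in links:
                continue
            gain = (d[i][j] - length[(i, j)]) * flux[(i, j)]
            if gain > best_gain:
                best, best_gain = (i, j), gain
    return best, best_gain

# Hypothetical 4 cities on a unit square, currently joined as the chain 0-1-2-3.
length = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
          (0, 2): math.sqrt(2), (1, 3): math.sqrt(2), (0, 3): 1.0}
flux = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
        (0, 2): 1.0, (1, 3): 1.0, (0, 3): 2.0}
links = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
pair, gain = best_new_link(4, links, length, flux)
```

Here the two ends of the chain are geometrically close but three links apart on the network, so closing the square gives the largest estimated saving. In a full run the chosen pair would be added to `links` and the step repeated.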
We first analyze the finite-size scaling of the variation
of the different cost components with the link density
$\rho(N, L) = 2L/\{N(N-1)\}$ using $\alpha = \beta = 1$ and
$\theta = 2$, with $L$ being the number of links of the
network. In Fig. 1(a) the total wiring cost $C_{net}$, scaled
by a power of $N$, increases monotonically with the link
density $\rho$ for all $N$. In Fig. 1(b) the total traffic
cost initially decreases very sharply but then almost
saturates, and $C_{tra}$, scaled by a power of $N$, scales
with $\rho N$. Finally, to obtain a scaling of the total cost
function we have to use the conversion factor
$\lambda = 10N$ and define $C = C_{net} + 10 N C_{tra}$, which
scales as a function of $\rho N$ after rescaling by a power
of $N$ (Fig. 1(c)); the corresponding scaling function
$\mathcal{F}(x)$ is independent of $N$. The total cost so
defined first decreases and then increases, and has a
distinct minimum at a specific link density $\rho_c$.
We call the network at this link density $\rho_c$ the globally
optimized traffic solution with respect to the total cost.

The nodal degree distribution has been calculated on
the optimized network at the link density $\rho_c$. From
Fig. 1(c) we see that $\rho_c \to 0$ as $N \to \infty$.
Consequently the degree distribution at $\rho = \rho_c$ is
nearly the same as the Poisson distribution, the degree
distribution of the MST. However, for finite link density
the network becomes more heterogeneous. This is manifested
by the appearance of many small-degree nodes and a few
large-degree hubs. In Fig. 2(a) we have plotted the degree
distribution $P(k, N)$ for three different network sizes but
at the same link density. The finite-size scaling is shown
in Fig. 2(b), where the scaling exponents $\eta = 1$ and
$\zeta = 1$ are used to obtain the best data collapse, giving
the degree distribution exponent $\gamma = \eta/\zeta = 1$.
This clearly shows that at finite link density the resulting
network is a scale-free network. At a different link density
the exponent value is the same, and the scaling range
increases with the density. It is also observed that the
model network is a small-world network from the density
$\rho_c$ and beyond. It has already been reported that the
air-route network of India has a scale-free structure [19].
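
A standard finite-size scaling ansatz consistent with the exponents quoted here can be written as follows. It is offered as an assumed form, since the scaling equation itself is not reproduced in this text, and $\mathcal{G}$ is a hypothetical name for the scaling function.

```latex
P(k, N)\, N^{\eta} \propto \mathcal{G}\!\left( k / N^{\zeta} \right),
\qquad
\mathcal{G}(x) \sim x^{-\gamma} \ \text{for small } x
\;\Longrightarrow\;
P(k, N) \sim N^{\gamma \zeta - \eta}\, k^{-\gamma}.
```

Requiring the power-law part of the distribution to be independent of $N$ gives $\gamma \zeta - \eta = 0$, i.e. $\gamma = \eta/\zeta$, which equals 1 for $\eta = \zeta = 1$.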

We apply our model to the air-traffic network in India.
There are $N = 80$ civilian airports in India for passenger
transport. Considering all 12 airline companies that
are active in the air-space of India, there are $L = 265$
links connecting different pairs of airports [21]. Every
airport is associated with a nearby city or town. We
have collected the 2001 census data (in India a general
census is done every ten years) for the populations of these
cities from the website [22]. We have checked that
the available data for the top 185 populated Indian cities
do indeed follow Zipf's law quite closely. The mutual
distances of separation among these cities are calculated
as the arc lengths of the great circles on the surface of
the Earth joining pairs of cities. The latitudes $\phi$ and
longitudes $\psi$ of these cities are available at the
website [23]. Knowing these angles, the inter-city distances
are obtained from

$\ell_{ij} = R_E \cos^{-1}\left( \sin\phi_i \sin\phi_j + \cos\phi_i \cos\phi_j \cos(\psi_i - \psi_j) \right)$

where $R_E = 3962.6$ miles is the Earth's radius and the
angles are measured in radians. As per our scheme we
start by constructing an MST using $\ell_{ij}$ as the weights.
Links are then added one by one using the optimization
procedure described above until the total of 265 links,
i.e., the link density $\approx 0.084$ of the real network,
is reached. Once the full network has been formed we draw
the airline networks for both the real and the model data
in Fig. 3(a) and 3(b), respectively. The visual similarity
of these two figures is quite striking. On a closer look it
seems that the links are denser in the real network, though
this is not true: the model network has exactly the same
number of links as the real network, but there are more
long links in the real network. This is because we started
from an MST, and shorter links are selected in the very
starting configuration.
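
The distance formula and the quoted link density can be checked with a short sketch. The function names are hypothetical, inputs are taken in degrees and converted to radians, and no actual airport coordinates are used; the example point pair is simply a quarter of the equator.

```python
import math

EARTH_RADIUS_MILES = 3962.6  # value quoted in the text

def great_circle_distance(lat1, lon1, lat2, lon2):
    """Arc length between two points given in degrees (spherical law of cosines)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dpsi = math.radians(lon1 - lon2)
    c = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(dpsi))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_MILES * math.acos(c)

def link_density(n_nodes, n_links):
    """rho = 2L / (N (N - 1))."""
    return 2.0 * n_links / (n_nodes * (n_nodes - 1))

quarter = great_circle_distance(0.0, 0.0, 0.0, 90.0)  # a quarter of the equator
rho = link_density(80, 265)
```

With $N = 80$ and $L = 265$ the density evaluates to 530/6320, which rounds to the value 0.084 quoted in the text.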

For both networks we measure the nodal degree $k_i$ and
the rank $r$ of every node. The degree vs. rank plots of
the 80-node networks of India are compared between the
real and model networks in Fig. 4(a). The comparison is
good, and both seem to obey a power-law dependence of
$k(r)$ on $r$, with exponent magnitudes 0.67 and 0.75 for
the real and model networks, respectively. Secondly, we
study a one-to-one comparison of the real and model
networks. For this purpose we calculate the factor
$k_{factor}(r) = (k_{model}(r) - k_{real}(r))/k_{real}(r)$ and
plot it against the rank $r$ in Fig. 4(b). We observe that
$k_{factor}(r)$ remains limited within $\pm 2$ for 80% of the
nodes. We conclude that the correspondence between the real
and the model networks is quite good. To explore the
dependence of our results on the parameter values we looked
at the variation of the quantity
$\chi = \sum_i (k_{model}(i) - k_{real}(i))^2$
within the range of $\alpha = \beta = 1 \pm 0.5$ and
$\theta = 2 \pm 0.5$, and observed around 10-15% variation.
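
The two comparison measures can be sketched as below. The reading of $\chi$ as a sum of squared degree differences is an assumption about a formula that is not fully legible in this text, and the degree lists are hypothetical values chosen for illustration, not the Indian airport data.

```python
def k_factor(k_model, k_real):
    """Relative degree mismatch (k_model - k_real) / k_real for one node."""
    return (k_model - k_real) / k_real

def chi(model_degrees, real_degrees):
    """Sum of squared degree differences over all nodes (assumed reading of chi)."""
    return sum((m - r) ** 2 for m, r in zip(model_degrees, real_degrees))

def fraction_within(model_degrees, real_degrees, bound=2.0):
    """Fraction of nodes whose |k_factor| does not exceed bound."""
    factors = [k_factor(m, r) for m, r in zip(model_degrees, real_degrees)]
    return sum(abs(f) <= bound for f in factors) / len(factors)

# Hypothetical degree lists, for illustration only.
model = [10, 6, 4, 2]
real = [1, 5, 4, 3]
first = k_factor(model[0], real[0])
total = chi(model, real)
share = fraction_within(model, real)
```

A node with model degree 10 and real degree 1 gives a factor of 9, so a single strongly mismatched node can dominate $\chi$ even when most nodes stay within the $\pm 2$ band.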

Finally, we would like to make three comments. (i)
There could be a number of ways in which our study
can be improved. The population in every city is
distributed within a wide range of economic strengths. Only
a fraction of the population at the top edge of the
distribution generally has access to air travel. Therefore
it is perhaps better to replace the city population in our
study by the number of people who are richer than a certain
cut-off mark. We thought that an approximate estimate
of this number would be the number of Income Tax payers.
Using these numbers may improve the analysis,
provided they are not proportional to the city populations.
Unfortunately we could not get city-wise statistics
of Income Tax payers. Also, a city airport serves the
neighboring towns and suburbs as well. Therefore an
effective population of the city including its surroundings
may be considered. (ii) This point, which is very
interesting, was raised by a referee. In Fig. 4(b) there is a
point whose $k_{factor}$ is 9. Such a high value implies that
the model degree is 10 times the real degree. We
identified the city as Kanpur, which has a population of
$\approx 2.7$ million and occupies the 10th position in the
rank distribution. In spite of such a high population, Kanpur
airport is connected only to Delhi and to no other airport in
the country, so its real degree is unity, whereas our
model predicts it to be 10. A search on the Internet
reveals that the traffic in Kanpur has indeed increased
considerably in recent years; there are modernization
plans from the government, and many other airlines are
also planning to operate there. Therefore it is expected
that the degree of Kanpur airport will increase
in the near future. This tendency has been correctly
predicted by our model. (iii) The third point was
raised by another referee. It may be interesting to
explore whether our method can be applied to the Indian
Railway network as well. Recently it has been observed that
such a network has the small-world property. The main
difference is that a large number of stations in a railway
network are intermediate stations having two stations on
the opposite sides. Therefore in the framework of our
study two cities should be linked only if there is at least
one train that starts at one city and finishes its journey
at the other. Perhaps we will take up this study in a
future project.

To summarize, we present evidence that a precise one- 
to-one reproduction of a network is possible once the key 
ingredients controlling the structure of the network are 
identified. We justify this claim by constructing an op- 
timized transport network of inter-city traffic in which 
traffic is controlled by both the city populations (Zipf's 
law) and by the rule of traffic distribution (Gravity law). 
The total cost function is determined by two competing 
factors, i.e., the cost of maintaining the traffic and the 
establishment cost of the network. This procedure has 
been applied to the airport network of India having 80 
civilian airports, and a node-to-node comparison of the 
nodal degree values between the real network and the 
model network is made. The correspondence is found to 
be very good. 

E-mail: manna@bose.res.in 